data(sources): enrich candidate source list from ChatGPT survey 3 #35
Merged
Adds 3 new candidate sources and enriches 10 existing candidates with
specific URLs, scale data, and access notes surfaced by a prioritised
commercial-use source survey (chatgpt_summary_3.md).

New sources
───────────
- commons__hebrew_language_manuscripts: Wikimedia Commons parent category (17 subcats + ~105 files: Cairo Geniza, Bible MSS, illuminated MSS, Wellcome, Damascus Pentateuch)
- commons__hebrew_calligraphy: Wikimedia Commons (~74 files + subcats; illuminated MSS and ketubot)
- openn__judaica_collection_index: OPenn Judaica umbrella index (openn.library.upenn.edu/html/judaica_contents.html); covers Gaster Hebrew MSS and other sub-collections not yet individually tracked

Enriched sources
────────────────
- openn__bl_hebrew_manuscripts: landing URL (collection 0032), scale confirmed ~435,000 images, Polonsky Foundation ref added
- openn__cairo_genizah_fragments: landing URL (genizah_contents.html)
- openn__manchester_hebrew_manuscripts: landing URL (0021.html) + critical caveat: Manchester own viewer is CC BY-NC; use OPenn copy (CC BY 4.0)
- openn__katz_center_judaica: landing URL (0002.html)
- openn__zucker_ketubah_collection: landing URL (0051.html)
- leipzig__hebrew_manuscripts: corrected URL to Leipzig direct page
- nypl__hebrew_manuscripts_digital_collections: 1,174 results count added
- mdz__hebrew_manuscripts: landing URL + scale (~700 pcs incl. 183 fragments)
- archive__hebrew_manuscripts: named high-value items (Leningrad Codex, Aleppo Codex, Cervera Bible, Lailashi Codex, Haverford Masoretic Bible)
- huggingface__sivan22_hebrew_handwritten: CC BY 3.0 licence detail, 5,093 rows / 28 classes, added policy-review note (CC-BY-3.0 not explicitly in AGENTS.md accepted list)

Validation: ok: 93 sources, 345 entries, 345 files verified, recipe ok
Tests: 80 passed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
What
Adds 3 new candidate source records and enriches 10 existing candidates with
specific sub-collection URLs, scale data, and access notes, based on a
prioritised commercial-use survey of Hebrew manuscript repositories.
New sources
- commons__hebrew_language_manuscripts
- commons__hebrew_calligraphy
- openn__judaica_collection_index
Enriched sources
All existing OPenn candidates now have specific sub-collection landing URLs instead of the generic openn.library.upenn.edu/:
- openn__bl_hebrew_manuscripts
- openn__cairo_genizah_fragments (genizah_contents.html)
- openn__manchester_hebrew_manuscripts
- openn__katz_center_judaica
- openn__zucker_ketubah_collection
Other enriched candidates:
- leipzig__hebrew_manuscripts: URL corrected from manuscripta-mediaevalia.de to Leipzig's own page (holds the PD rights statement)
- nypl__hebrew_manuscripts_digital_collections
- mdz__hebrew_manuscripts
- archive__hebrew_manuscripts
- huggingface__sivan22_hebrew_handwritten
Docs
docs/sources/chatgpt_summary_3.md: full survey with prioritised ingestion ordering
Validation
🤖 Generated with Claude Code
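The licence caveats in the commit message (Manchester's own viewer being CC BY-NC while the OPenn copy is CC BY 4.0, and CC BY 3.0 not being explicitly on the AGENTS.md accepted list) suggest a three-way triage of candidates. A minimal sketch of that triage follows. The `Candidate` record shape, the `review_status` helper, and the contents of `ACCEPTED_LICENCES` are all hypothetical; the real schema and accepted list are not shown in this PR.

```python
from dataclasses import dataclass

# Hypothetical accepted list: the real one lives in AGENTS.md and is not
# reproduced in this PR. Only CC BY 4.0 is implied as usable by the PR text.
ACCEPTED_LICENCES = {"CC0-1.0", "PDM-1.0", "CC-BY-4.0"}

# Substrings marking licences that rule out commercial use.
NON_COMMERCIAL_MARKERS = ("-NC",)


@dataclass
class Candidate:
    """Hypothetical candidate source record (not the repository's schema)."""
    source_id: str
    licence: str  # SPDX-style identifier, e.g. "CC-BY-4.0"


def review_status(candidate: Candidate) -> str:
    """Return 'reject', 'accept', or 'needs-policy-review' for a candidate."""
    licence = candidate.licence.upper()
    # Non-commercial licences are rejected before anything else is checked.
    if any(marker in licence for marker in NON_COMMERCIAL_MARKERS):
        return "reject"
    if licence in ACCEPTED_LICENCES:
        return "accept"
    # Anything else (for example CC BY 3.0) is flagged rather than guessed at.
    return "needs-policy-review"


if __name__ == "__main__":
    for cand in (
        Candidate("openn__manchester_hebrew_manuscripts", "CC-BY-4.0"),
        Candidate("manchester_own_viewer", "CC-BY-NC-4.0"),
        Candidate("huggingface__sivan22_hebrew_handwritten", "CC-BY-3.0"),
    ):
        print(cand.source_id, review_status(cand))
```

Flagging unknown licences instead of rejecting them mirrors the policy-review note added for the Hugging Face dataset: the licence may well be acceptable, but that decision is left to a human.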